Who are we wrt R?
Wherever you are, you’re not alone! As we begin learning R (or learning new things in R), remember…
2025-01-16
R is the computational engine; RStudio is the interface
Projects allow RStudio to leave notes for itself (e.g., history), will always start a new R session when opened, and will always set the working directory to the Project directory.
Create a system for organizing the objects in this project!
Functions are the “verbs” that allow us to manipulate data. Packages contain functions, and all functions belong to packages.
R comes with about 30 packages (“base R”). There are over 10,000 user-contributed packages; you can discover them online on the Comprehensive R Archive Network (CRAN), with more in active development on GitHub.
To use a package, install it once:
Type tidyverse (or a different package name), then click on Install.
Or run install.packages("tidyverse")
In each new R session, you’ll have to load the package if you want access to its functions: e.g., type library(tidyverse).
The # character demarcates code comments
<- is the assignment operator, how we name new objects in the R environment
R has two native data formats:
.RDS (a single object): readRDS("path/filename.RDS"), saveRDS(object, file = "path/filename.RDS")
.RData (one or more objects): load("path/filename.RData"), save(object1, object2, file = "path/filename.RData"), save.image("path/filename.RData")
You can import most other data formats if you know the right command (and package):
read.csv (base R), read_csv (tidyverse)
read_excel (readxl)
read.dta (foreign), read_dta (haven)
Primary data types include numeric, integer, logical, and character, plus factors.
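As a minimal sketch of the two native formats, the round trip below writes to temporary files and reads the objects back. The object names (scores, labels) and the use of tempfile() are illustrative assumptions, not part of the course materials.

```r
# Assign a small data frame to a name with the assignment operator
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# .RDS stores a single object; readRDS() returns it, so you choose the new name
rds_path <- tempfile(fileext = ".RDS")
saveRDS(scores, file = rds_path)
scores_copy <- readRDS(rds_path)

# .RData can store several objects; load() restores them under their original names
labels <- c("a", "b", "c")
rdata_path <- tempfile(fileext = ".RData")
save(scores, labels, file = rdata_path)
rm(scores, labels)               # remove the originals to show that load() brings them back
restored <- load(rdata_path)     # load() returns the names of the restored objects
```

The practical difference: readRDS() hands back one object that you name yourself, while load() recreates objects under whatever names they had when saved.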
Download R materials from today’s Canvas page!
Artwork by @allison_horst
Examining data:
names()
head() and tail()
str(); glimpse() (dplyr equivalent)
summary()
These (base R) commands operate on the full object (all variables/columns in a data frame). To access a specific variable/column, use the $ operator: df$varname.
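The commands above can be tried on any data frame. This sketch assumes mtcars, a data frame that ships with R, as a stand-in for your own data; the object names are hypothetical.

```r
# mtcars is a built-in data frame (an assumed stand-in for your own data)
col_names  <- names(mtcars)      # column names
first_rows <- head(mtcars, 3)    # first 3 rows
last_rows  <- tail(mtcars, 3)    # last 3 rows
str(mtcars)                      # compact structure: type and first values of each column
summary(mtcars)                  # summary statistics for every column

# The $ operator pulls out one column as a vector
mpg_values <- mtcars$mpg
summary(mpg_values)              # summary of that single column
```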
Part of the tidyverse, dplyr is a package for data manipulation. The package implements a grammar for transforming data, based on verbs/functions that define a set of common tasks.
dplyr functions are for data frames.
The first argument of dplyr functions is always a data frame
select() - extract variables
select(.data, var1, var2, var3)
select() helpers include:
select(.data, var1:var10): select a range of columns
select(.data, -c(var1, var2)): select every column but var1 and var2
select(.data, starts_with("string")): select columns whose names start with… (or ends_with("string"))
select(.data, contains("string")): select columns whose names contain…
filter() - extract rows
filter(.data, var == value)
| Logical tests | Boolean operators for multiple conditions |
|---|---|
| x < y: less than | a & b: and |
| x >= y: greater than or equal to | a \| b: or |
| x == y: equal to | xor(a, b): exclusive or (exactly one is true) |
| x != y: not equal to | !a: not |
| x %in% y: is a member of | |
| is.na(x): is NA | |
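To make select(), its helpers, and filter() with logical tests concrete, here is a short sketch. It assumes the dplyr package is installed and uses the built-in mtcars data frame as a stand-in; the column choices and object names are illustrative only.

```r
library(dplyr)

# select(): keep columns by name, range, exclusion, or helper
by_name    <- select(mtcars, mpg, cyl, hp)
by_range   <- select(mtcars, mpg:disp)             # mpg, cyl, disp
all_but    <- select(mtcars, -c(mpg, cyl))         # every column except mpg and cyl
d_columns  <- select(mtcars, starts_with("d"))     # names starting with "d"
ar_columns <- select(mtcars, contains("ar"))       # names containing "ar"

# filter(): keep rows that pass a logical test
four_cyl       <- filter(mtcars, cyl == 4)             # equal to
efficient_four <- filter(mtcars, cyl == 4 & mpg > 30)  # two conditions joined with "and"
four_or_six    <- filter(mtcars, cyl %in% c(4, 6))     # membership test
no_missing_mpg <- filter(mtcars, !is.na(mpg))          # drop rows where mpg is NA
```

Note the double equals sign in cyl == 4: a single = would be read as an argument name, not a comparison.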
arrange() - reorder rows
arrange(.data, var)
arrange(.data, desc(var))
count() - tabulate values of variables
count(.data, var)
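A brief sketch of arrange() and count(), again assuming dplyr is installed and using the built-in mtcars data frame as example data (the object names are hypothetical):

```r
library(dplyr)

# arrange(): sort rows in ascending order, or descending with desc()
lowest_mpg_first  <- arrange(mtcars, mpg)
highest_mpg_first <- arrange(mtcars, desc(mpg))

# count(): one row per distinct value of cyl, with the tally in a column named n
cyl_counts <- count(mtcars, cyl)
```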
The pipe (%>%) allows you to chain together functions by passing (piping) the result on the left into the first argument of the function on the right. It allows us to call a series of functions in sequence (read the pipe as “and then…”).
dataframe %>% filter(var1 > 0) %>% select(var1, var2, var3)
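To see what the pipe is doing, this sketch writes the same two steps as a piped chain and as a nested call. It assumes dplyr is installed (dplyr makes %>% available) and uses the built-in mtcars data frame; the threshold and column names are illustrative.

```r
library(dplyr)

# Piped: take mtcars, and then keep rows with mpg > 25, and then keep three columns
piped <- mtcars %>%
  filter(mpg > 25) %>%
  select(mpg, cyl, hp)

# Nested equivalent: the same steps, read from the inside out
nested <- select(filter(mtcars, mpg > 25), mpg, cyl, hp)
```

Both forms produce the same data frame; the piped version simply reads in the order the steps happen.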